-p 80 \
-o filtered.fastq \
-Q33
fastqc filtered.fastq
firefox filtered_fastqc.html
In the above script, the “-i” option specifies the input FASTQ file, “-q” specifies the minimum
Phred quality threshold, “-p” specifies the minimum percentage of bases in a read that must
meet or exceed that threshold for the read to be kept, “-o” specifies the name of the output
FASTQ file where the filtered reads are stored, and “-Q33” tells the program that the FASTQ
quality encoding is Phred+33 (the default is “-Q64”; therefore, we must use “-Q33” for FASTQ
files with Illumina 1.9 encoding or later).
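To make the “-q” and “-p” criteria concrete, the following minimal Python sketch shows one way such a filter could decide whether to keep a read. It is an illustration only, not the implementation of the filtering program: the function names are hypothetical, the quality threshold of 20 is an assumed example value, and the program's exact handling of boundary cases may differ.

```python
def phred_scores(quality: str, offset: int = 33) -> list:
    """Decode an ASCII quality string into Phred scores.

    offset=33 corresponds to "-Q33" (Phred+33); offset=64 to "-Q64".
    """
    return [ord(ch) - offset for ch in quality]


def passes_filter(quality: str, min_quality: int, min_percent: float,
                  offset: int = 33) -> bool:
    """Return True if at least min_percent of the bases have a Phred
    score of min_quality or higher (hypothetical helper)."""
    scores = phred_scores(quality, offset)
    if not scores:
        return False  # an empty read cannot satisfy the criteria
    good = sum(1 for s in scores if s >= min_quality)
    return 100.0 * good / len(scores) >= min_percent


# Under Phred+33, 'I' decodes to Phred 40 and '#' decodes to Phred 2.
# Assumed thresholds: minimum quality 20, minimum percentage 80.
print(passes_filter("IIIIIIII##", min_quality=20, min_percent=80))  # 8 of 10 bases qualify
print(passes_filter("IIIIIII###", min_quality=20, min_percent=80))  # 7 of 10 bases qualify
```

Decoding the same quality string with the wrong offset shifts every score by 31, which is why the encoding option must match the file.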
FIGURE 1.32 A graph of the filtered “bad.fastq” file with low-quality bases at the read ends.
Figure 1.32 shows the per base sequence quality graph of the filtered FASTQ file. The
filtering process removed 499,970 reads that did not meet the criteria. The per base
sequence quality, which is the most important metric, has improved, and the per base
sequence content has also improved. However, some positions at the ends of the reads
still have low Phred quality scores. We can trim the low-quality bases from the ends of the
reads by using the “fastq_quality_trimmer” program. Instead of removing the reads that